Goto

Collaborating Authors

 long step


Open Problem: Anytime Convergence Rate of Gradient Descent

arXiv.org Artificial Intelligence

Recent results show that vanilla gradient descent can be accelerated for smooth convex objectives, merely by changing the stepsize sequence. We show that this can lead to surprisingly large errors indefinitely, and therefore ask: Is there any stepsize schedule for gradient descent that accelerates the classic $\mathcal{O}(1/T)$ convergence rate, at \emph{any} stopping time $T$?


Provably Faster Gradient Descent via Long Steps

arXiv.org Artificial Intelligence

This work proposes a new analysis technique for gradient descent, establishing provably better convergence rates for smooth, convex optimization than the prior state-of-art textbook proofs. Our theory allows for nonconstant stepsize policies, periodically taking larger steps that may violate the monotone decrease in objective value typically needed by analysis. In fact, contrary to the common intuition, we show periodic long steps, which may increase the objective value in the short term, provably speed up convergence in the long term, with increasingly large gains as longer and longer steps are periodically included. This bears a similarity to accelerated momentum methods, which also depart from ensuring a monotone objective decrease at every iteration. Establishing this requires a proof technique capable of analyzing the overall effect of many iterations at once rather than the typical (naive) one-iteration inductions used in most first-order method analyses. Our proofs are based on the Performance Estimation Problem (PEP) ideas of [1-3], which cast computing/bounding the worst-case problem instance of a given algorithm as a Semidefinite Program (SDP). We show that the existence of a feasible solution to a related SDP proves a descent guarantee after applying a corresponding pattern of nonconstant stepsizes, from which faster convergence guarantees follow.